In [1]:

    
import random
import timeit
from collections.abc import Sequence
import pandas as pd



In [2]:

    
%matplotlib inline
import seaborn as sns
import matplotlib.pyplot as plt

On a hunch, I split this implementation of `mean` into two cases. I wanted it to accept any iterable, which the second case handles by counting items as it goes. But when the argument is a `Sequence`, it has a length, so we can divide the built-in `sum` by `len` instead.



In [3]:

    
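def mean(a):
    if isinstance(a, Sequence):
        # Sequences support len(), so let the C-implemented sum() do the work.
        return sum(a)/float(len(a))
    else:
        # Any other iterable (e.g. a generator): count items while summing,
        # since len() is unavailable and the iterable may be consumable only once.
        s = n = 0
        for x in a:
            s += x
            n += 1
        return s/float(n)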
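Which branch runs depends only on whether the argument is a `collections.abc.Sequence`. This short sketch checks a few common types and shows why the fallback is needed: a generator has no `__len__`, so `len()` raises `TypeError`, and it can be consumed only once. Note that a set has a length but is not a Sequence, so `mean` would send it down the loop branch.

```python
from collections.abc import Sequence

# Lists, tuples and ranges are registered as Sequences.
print(isinstance([1, 2, 3], Sequence))   # True
print(isinstance((1, 2, 3), Sequence))   # True
print(isinstance(range(3), Sequence))    # True

# A set supports len() but not indexing, so it is not a Sequence.
print(isinstance({1, 2, 3}, Sequence))   # False

# A generator expression is iterable but is not a Sequence.
gen = (x for x in range(3))
print(isinstance(gen, Sequence))         # False

# len() does not work on a generator, so the loop branch counts items itself.
try:
    len(gen)
except TypeError as exc:
    print('TypeError:', exc)

# sum() still accepts it, but the generator is exhausted afterwards.
print(sum(gen))   # 3
print(sum(gen))   # 0
```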

First case: a list is a `Sequence`, so `mean` takes the `sum`/`len` branch.



In [4]:

    
mean([random.random() for i in range(100000)])

    Out[4]:

0.5010230416561832

Second case: a generator expression is iterable but not a `Sequence`, so `mean` takes the loop branch.



In [5]:

    
mean(random.random() for i in range(100000))

    Out[5]:

0.4990815515554537

But is it really worth having a separate implementation of `mean` for Sequences? Let's time both approaches and see.



In [6]:

    
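def mean_loop(a):
    # Pure-Python accumulation: works for any iterable.
    s = n = 0
    for x in a:
        s += x
        n += 1
    return s/float(n)

def mean_seq(a):
    # Requires a Sequence: built-in sum() plus len().
    return sum(a)/float(len(a))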
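One alternative the notebook does not benchmark is to copy any non-Sequence iterable into a list first and then always use `sum`/`len`. This trades O(n) extra memory for the speed of the built-in `sum`. The name `mean_any` is hypothetical, and the empty-input check is an added assumption: the functions above raise `ZeroDivisionError` on empty input.

```python
from collections.abc import Sequence

def mean_any(a):
    """Mean of any finite iterable of numbers (hypothetical helper).

    Non-Sequence iterables (generators, sets, ...) are copied into a list so
    the fast built-in sum() and len() can be used; this costs O(n) memory.
    """
    if not isinstance(a, Sequence):
        a = list(a)
    if not a:
        # Assumption: report empty input explicitly instead of ZeroDivisionError.
        raise ValueError('mean of empty iterable')
    return sum(a) / len(a)

print(mean_any([1, 2, 3, 4]))              # 2.5
print(mean_any(x * 2 for x in range(5)))   # 4.0
```

Whether the copy pays off depends on how much memory you can spare; timing it alongside `mean_loop` with the harness in the next cells would settle it.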



In [7]:

    
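n = 1000  # number of calls timed per (size, implementation) cell
sizes = [1000, 2000, 3000, 5000, 10000, 20000, 50000, 70000, 100000]
# (column label, statement passed to timeit)
cases = [('loop', 'mean_loop(a)'),
         ('sum',  'mean_seq(a)'),
         ('if',   'mean(a)')]
# One row per sequence length; the timing columns are filled in below.
df = pd.DataFrame(index=sizes, columns=('sizes',) + tuple(key for key, cmd in cases))
df['sizes'] = sizes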

Now time each implementation by calling it `n` = 1000 times on tuples of each length. This takes about 30 seconds, most of it spent in the loop version.



In [8]:

    
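for size in sizes:
    # timeit looks up `a` in globals(), so it must be a module-level name.
    a = tuple(random.random() for i in range(size))
    for key, cmd in cases:
        t = timeit.timeit(cmd, number=n, globals=globals())
        # DataFrame.set_value was removed in pandas 1.0; .at replaces it.
        df.at[size, key] = t
# The frame was created empty (object dtype); convert so plotting sees numbers.
df = df.astype(float)
df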
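The cell above passes statement strings plus `globals=globals()` to `timeit`. As a sketch of an alternative: `timeit.repeat` also accepts a zero-argument callable, and the `timeit` documentation suggests looking at the minimum of several repeats, since higher values are usually caused by other processes interfering rather than by Python itself. The helper `best_time` and the small sizes and counts here are illustrative assumptions, chosen so it runs in about a second.

```python
import random
import timeit

def mean_loop(a):
    # Accumulate total and count in one pass; works for any iterable.
    s = n = 0
    for x in a:
        s += x
        n += 1
    return s / n

def mean_seq(a):
    # Requires a Sequence: uses the built-in sum() plus len().
    return sum(a) / len(a)

def best_time(func, data, number=100, repeat=5):
    """Hypothetical helper: minimum over `repeat` runs of `number` calls."""
    times = timeit.repeat(lambda: func(data), number=number, repeat=repeat)
    return min(times)

results = {}
for size in (1000, 10000):
    data = tuple(random.random() for _ in range(size))
    results[size] = {
        'loop': best_time(mean_loop, data),
        'sum': best_time(mean_seq, data),
    }

for size, row in results.items():
    print(size, row, 'ratio: {:.1f}'.format(row['loop'] / row['sum']))
```

Passing a `lambda` avoids the `globals=` argument entirely, at the cost of one extra function call per timed iteration.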



In [9]:

    
# One linear fit (with confidence band) per implementation.
for key, cmd in cases:
    ax = sns.regplot(x='sizes', y=key, data=df, label=key)
plt.ylabel('seconds')
plt.xlabel('sequence length')
plt.title('Running time to find mean of a sequence {} times'.format(n))
plt.legend(loc='upper left')

    Out[9]:

[Figure: Running time to find mean of a sequence 1000 times; seconds vs. sequence length, with linear fits for loop, sum, and if]

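For sequences of modest size, every implementation is practically instantaneous: even the loop version needs only about 10 ms per call at 100,000 elements. But the relative speedup is substantial. The `sum`-based version is roughly 13 to 22 times faster than the explicit loop across the sizes tested, and the `if` version, which only adds an `isinstance` check, stays close to `sum`. So why not take the speedup when we can get it? 😁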

Total seconds for n = 1000 calls at each sequence length:

         sizes       loop         sum          if
1000      1000  0.0921272  0.00600786  0.00690552
2000      2000   0.192143    0.011381   0.0122312
3000      3000   0.271834   0.0170192   0.0208895
5000      5000    0.46201    0.033739   0.0492685
10000    10000    1.13335    0.078896   0.0770089
20000    20000    2.35542     0.10861    0.110705
50000    50000    4.60694    0.258446    0.270928
70000    70000    7.27327     0.56273    0.544876
100000  100000    10.0002    0.555258    0.606015
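To put numbers on the plot and the table above, this sketch copies the `loop` and `sum` columns from the table (seconds for 1000 calls), computes the loop/sum ratio at each size, and fits an ordinary least-squares line by hand, the same kind of first-order fit `sns.regplot` draws by default. It uses plain Python so it runs without pandas; `ols_slope` is a hypothetical helper name.

```python
# Timings copied from the table above: seconds for n = 1000 calls.
sizes = [1000, 2000, 3000, 5000, 10000, 20000, 50000, 70000, 100000]
loop = [0.0921272, 0.192143, 0.271834, 0.46201, 1.13335,
        2.35542, 4.60694, 7.27327, 10.0002]
sums = [0.00600786, 0.011381, 0.0170192, 0.033739, 0.078896,
        0.10861, 0.258446, 0.56273, 0.555258]

# Speedup of the sum()-based version at each size.
ratios = [l / s for l, s in zip(loop, sums)]
print('speedup range: {:.1f}x to {:.1f}x'.format(min(ratios), max(ratios)))

def ols_slope(x, y):
    """Least-squares slope of y on x: cov(x, y) / var(x)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    var = sum((xi - mx) ** 2 for xi in x)
    return cov / var

# Slopes are seconds per element for 1000 calls; divide by 1000 for one call.
loop_slope = ols_slope(sizes, loop)
sum_slope = ols_slope(sizes, sums)
print('per-element cost, one call: loop {:.1e} s, sum {:.1e} s'.format(
    loop_slope / 1000, sum_slope / 1000))
print('slope ratio: {:.1f}x'.format(loop_slope / sum_slope))
```

With the table's numbers, the first line reports a speedup range of 12.9x to 21.7x. The fitted slopes average over the noisier points (such as the `sum` value at 70,000, which exceeds the one at 100,000), so they give a steadier single figure than any individual ratio.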